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Abstract. Methods for the automatic composition of services into ex- 
ecutable workflows need detailed knowledge about the application do- 
main, in particular about the available services and their behavior in 
terms of input/output data descriptions. In this paper we discuss how 
the EMBRACE data and methods ontology (EDAM) can be used as 
background knowledge for the composition of bioinformatics workflows. 
We show by means of a small example domain that the EDAM knowl- 
edge facilitates finding possible workflows, but that additional knowl- 
edge is required to guide the search towards actually adequate solutions. 
We illustrate how the ability to flexibly formulate domain-specific and 
problem-specific constraints supports the workflow development process. 

1 Introduction 

The challenge of automatic workflow composition, in particular the automatic 
composition of bioinformatics web services into executable workflows, has been 
addressed by several projects in the past years (see, e.g., [1-5]). The importance 
of domain-specific knowledge about services and data types has been recognized 
long ago (see, e.g., [6]). However, most automatic workflow composition systems 
have so far relied on self-defined, special-purpose domain models, rather than 
using systematic information from a central instance (simply because no such 
instance existed). Some of the systems named above do in fact use knowledge 
from the BioMoby [7] ontologies (which apply the LSID [8] naming scheme), 
but these have mainly been derived from the (often incomplete or imprecise) 
meta-information that service providers submit during service registration at 
MobyCentral, and not systematically designed for depicting the structure of 
the whole bioinformatics domain. Quite recently the EDAM (EMBRACE Data 
And Methods) ontology [9] has been initiated with the aim of building a unified 
vocabulary about bioinformatics services and data, suitable for bridging the 
gap between mere service registries and semantically aware service composition 
methodologies. 



In this paper we show how the domain knowledge that EDAM provides can 
be used as basis for the automation of service composition. After an introduc- 
tion to the principles of automatic workflow composition and to the software 
framework that we used for this study (Section 2) we describe the EDAM on- 
tology (Section 3), especially focussing on the parts that are relevant for auto- 
matic workflow composition. In Section 4 (Results and Discussion) we present an 
example domain and workflow composition problem, showing that the EDAM 
knowledge facilitates finding possible workflows, but that additional knowledge 
is required to limit the search to the actually desired or adequate solutions. We 
use the PROPHETS synthesis framework, which facilitates a very flexible way of 
expressing additional knowledge: it can either be specified during domain mod- 
eling (especially suitable for domain-specific constraints) or during the actual 
synthesis (especially suitable for problem-specific constraints) . In this way, users 
can flexibly interact with the workflow development framework, collecting pos- 
sible solutions and continuously refining the constraints according to any arising 
requirements. The paper ends with a conclusion in Section 5. 

2 Automatic Workflow Composition 

The automatic composition of (small) software units into (large) runnable pieces 
of software has been subject to research for several years (see, e.g. [10-18]). Con- 
temporary terminology, for instance automatic workflow composition or auto- 
matic service composition, reflects the current trend towards service orientation 
and (business) process modeling. Nevertheless, the principles of the composition 
methodologies on which they are based remain the same. 

The commonly known methods can roughly be distinguished into synthesis- 
based and planning-based approaches: Synthesis methods usually are rather 
behavior-oriented (e.g., the temporal- logic-based methods of [13,16]), whereas 
classical planning algorithms are more focused on the availability of resources 
and definition of a world state with predicates, which can be modified by the 
actions (cf. [19, chapter 10]). In addition, several hybrid approaches, such as 
LTL-enhanced planning [20, 17], exist that incorporate aspects of the respective 
other method. Despite all the differences with regard to the above named strate- 
gies and algorithms, all approaches share an essential characteristic: they search 
for paths in some kind of "universe" that is given by the information about the 
services and in particular their input/output behavior. 

Accordingly, the quality of the solutions crucially depends on the quality of 
the available information about the services of the domain (more than on the 
actual search strategy): Merely syntactic information (as provided by most pro- 
gramming language APIs or standard web service interfaces) is not sufficient. 
Proper semantic descriptions of services and data types are required to obtain 
meaningful results. For instance, assume a web service that returns a job ID, 
and another service that consumes a nucleotide sequence. From a programming 
language point of view, both would be classified as character sequences (strings) 
and would thus be assumed to match. While this is syntactically correct, sub- 



mitting a job ID to a sequence analysis service would of course not lead to the 
desired results, so a more precise - semantic - interface description is required. 

Automatic Workflow Composition in Bio-jETI 

For the work presented in this paper, we used the PROPHETS extension of 
the Bio-jETI framework [21]. PROPHETS seamlessly integrates automatic ser- 
vice composition methodology into Bio-jETI, in particular by supporting visu- 
alized/graphical semantic domain modeling, loose specification within the work- 
fiow model, non-formal specification of constraints using natural language tem- 
plates, and automatic generation of model checking formulas (to check global 
properties of workfiows). For a detailed introduction into the framework and the 
underlying ideas the reader is referred to [22,23]. In a nutshell, working with 
PROPHETS incorporates two phases: 

1. During domain modeling meta-information about the available services is 
collected and provided in appropriate form, and semantic classifications of 
the services and their input and output types are defined. The service and 
type taxonomies are stored in OWL format, where the OWL classes rep- 
resent abstract classifications, and the actual types and services are then 
represented as individuals that are related to one or more of those classifi- 
cations by instance- of relations. 

2. In the actual workflow design phase this domain is used to (automati- 
cally) create the workfiow models. Central is the idea of loose specification: 
branches in the model can be marked as loosely specified^ which are then 
automatically replaced by adequate composite services. 

The algorithm [24] that is then applied to complete a loosely specified work- 
fiow to be fully executable takes two orthogonal aspects into account: On the 
one hand, the workflow must be type-consistent to be executable, on the other 
hand, the constraints specifled by the workflow designer must be met: 

— The configuration universe constitutes the algorithm's basic search space. 
It contains all valid execution sequences and is implicitly defined by the 
domain model. As this configuration universe is usually very large, it is not 
explicitly generated from the domain definition, but on the fly during the 
synthesis process. 

— The specification formula is the second aspect. It describes all sequences of 
services that meet the individual workfiow specification, but without tak- 
ing care of actual execut ability concerns. As the explicit representation of 
all these sequences would be extremely large, the formula is not explicitly 
built, but given as a declarative formula in SLTL (Semantic Linear Time 
Logic) [24] , a variant of the commonly known propositional linear-time logic 
(PLTL). The formula is a conjunction of all available constraints, comprising: 

• technical constraints (configurations of the synthesis algorithm), 

• domain-specific constraints (globally specified by the domain modeler), 
and 



• problem-specific constraints (derived from the loosely specified branch 
or specified by the workflow designer). 

To start the search for solutions, the synthesis algorithm requires an initial 
state (i.e. a set of start types). They are determined automatically according to 
the output specification of the service at the beginning of the loose specifica- 
tion. Based on this the synthesis algorithm performs a parallel evaluation of the 
configuration universe and the specification formula to search for paths that are 
consistent with both the configuration universe and the SLTL formula. Each of 
those paths is a valid concretization of the loosely specified branch. 

3 EDAM as Background Knowledge 

The EDAM (EMBRACE Data And Methods) ontology^ has been developed 
in the scope of the EMBRACE (European Model for Bioinformatics Research 
and Community Education) project^ as an ontology for describing life science 
web services [9]. In contrast to many known ontologies like the Gene Ontology 
[25] or the majority of the Open Biomedical Ontologies [26], who focus on the 
description of biological content, it provides a vocabulary of terms and relations 
that can be used for annotating services with useful (semantic) metadata, in 
particular regarding their behavior and their inputs and outputs. Two important 
applications of EDAM have already been identified in [9] : the use of the defined 
terms for semantic annotations of web services (e.g. via SAWSDL extension 
attributes [27]) in order to facilitate service discovery and integration, and the 
more detailed description of the involved data types in order to improve the data 
exchange between services. 

Strictly speaking, EDAM is not a single, large ontology, but consists of six 
separate (sub-) ontologies: 

1. Biological entity: physically existing (parts of) things, such as SNPs, al- 
leles, or protein domains. 

2. Topic: fields of bioinformatics study, such as nucleic acid sequence analysis, 
model organisms, or visualization and rendering. 

3. Operation: particular functions of tools or services (e.g., web service opera- 
tions) , such as annotation, codon usage analysis, or sequence database search 
by motif or pattern. 

4. Data resource: various kinds of data resources, such as Biological pathways 
data resource. Literature data resource, or Ontology data resource. 

5. Data: semantic descriptions of data entities that are commonly used in 
bioinformatics, such as BioCyc enzyme ID, phylogenetic consensus tree, or 
sequence alignment. 

6. Data format: references to (syntactic) data format specifications, such as 
Clustal sequence format, PubMed Central article format, or HMMER hidden 
Markov model format. 

^ http: //edamontology . sourcef orge .net/ 
^ http://www.embracegrid.info/ 



Importantly, EDAM is not a catalogue of concrete services, data, resources 
etc., but a provider of terms for classification of such entities - exactly what 
is meant by "background knowledge" . Thus, using EDAM is simply taking the 
(relevant parts of the) ontology and sorting the application-specific resources 
into this skeletal domain structure. In the following we use the Operation (sub-) 
ontology as vocabulary for service classifications, and the Data (sub-) ontology 
for data type classifications. Employing also the Data format part in order to 
take into account the more technical interface specifications is subject of ongoing 
work. (For this study we used EDAM version betaOS.) 

4 Results and Discussion 

Using EDAM as background knowledge in the domain model for synthesis with 
PROPHETS, the domain setup involves three major steps: 

1. Converting EDAM from OBO (Open Biomedical Ontologies) format into 
OWL format (using the Protege OWL API). 

2. Generating the service taxonomy from the Operation term and (transitively) 
all its subclasses, and the type taxonomy from the Data term and (transi- 
tively) all its subclasses. 

3. Sorting the available services and their input /output types into the service 
and type taxonomy, respectively. 

Steps 1 and 2 can be executed fully automatically. Step 3 can be automated if 
EDAM annotations are available for the types and services, as it is the case, for 
instance, for EMBOSS [28], which provides EDAM relations in its Ajax Com- 
mand Definition (ACD) files. In the following we will use a smaller example 
domain that comprises a set of bioinformatics tools for which the taxonomic 
classifications have been defined manually. It uses only a small part of EDAM, 
but it is already large enough to illustrate our results. 

Table 1 lists the services in the example domain, along with their input and 
output data types. Concrete service and data type names are given in normal 
font, while (abstract) services and types in terms of the EDAM ontology are 
given in italics. Note that the service interface descriptions in this domain are 
quite simple: each one has at most one input and one output, whereby only 
user data (i.e. input and output files) are considered, while other parameters 
(primarily used for configurations that are not actually inputs) are not taken 
into account. 

Figures 1 and 2 show the service and type taxonomies (screenshots from 
PROPHETS' built-in ontology editor). The classes (the blue squares) correspond 
to the Operations and Data (sub-) ontologies of EDAM, respectively, but have 
been cut down to the classes that are relevant for the services and data that are 
used in the domain model. The purple rhombs represent the concrete services 
and types of the domain that have been added to the domain model as instances 
of the respective EDAM classes. 



Table 1. Domain model: service descriptions. 



Service 


Behavior 


ClustalW 

Global multiple sequence alignment 


In: Sequence 

Out: Multiple sequence alignment 


ClustalW2 

Global multiple sequence alignment 


In: Sequence 

Out: Multiple sequence alignment 


DBFetch_FetchBatch 
Database search and retrieval 


In: Sequence identifier 
Out: Sequence 


DBFetch_FetchData 
Database search and retrieval 


In: Sequence identifier 
Out: Sequence 


Gblocks 

Sequence alignment conservation analysis 


In: Multiple sequence alignment 
Out: Multiple sequence alignment 


KAlign 

Global multiple sequence alignment 


In: Sequence 

Out: Multiple sequence alignment 


Mafft 

Global multiple sequence alignment 


In: Sequence 

Out: Multiple sequence alignment 


Muscle 

Global multiple sequence alignment 


In: Sequence 

Out: Multiple sequence alignment 


PliyML_AminoAcid 

Phylogenetic tree construction 
from molecular sequences 


In: Protein Sequence 
Out: Phylogenetic tree 


PhyML_DNA 

Phylogenetic tree construction 
from molecular sequences 


In: DNA sequence 
Out: Phylogenetic tree 


poptree_NJ 

Phylogenetic tree construction 
(minimum distance methods) 


In: Sequence composition 
Out: poptree_outfile 


poptree_UPGMA 
Phylogenetic tree construction 
(minimum distance methods) 


In: Sequence composition 
Out: poptree_outfile 


postree 

Phylogenetic tree drawing 


In: poptree_outfile 

Out: Phylogenetic tree image 


predator 

Protein secondary structure prediction 


In: Protein sequence 

Out: Protein secondary structure 


ps2pdf 


In: Image 
Out: Image 


ReadFile 
File loading 


Out: Data 


ReadDNASequence 

File loading 


Out: DNA sequence 


TCoffee 

Global multiple sequence alignment 


In: Sequence 

Out: Multiple sequence alignment 


WriteFile 


In: Data 


WUBlast 

Sequence database search by sequence 
(word-based methods) 


In: Sequence 

Out: Sequence database hits 


Viewer 

Visualisation and rendering 


In: Data 


WUBlastParser 


In: Sequence database hits 
Out: Sequence identifier 
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Fig. 1. Service taxonomy. 




Fig. 2. Type taxonomy. 
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Fig. 3. Loosely specified workflow and some possible concretizations. 



Figure 3 shows the simple loosely specified workflow that we used as ba- 
sis for experimentation with the described domain model, along with five pos- 
sible concretizations. The workflow begins with reading a DNA sequence file 
(ReadDNASequence) and ends with displaying a result (Viewer). These compo- 
nents are connected by a loosely specified branch (colored in red). Accordingly, 
the basic synthesis problem is to find a workflow that takes a ReadDNASe- 
quence's output (a DNA sequence) as input, and produces Viewer's input (some 
data) as output. Note that this setup is in particular appropriate for experimen- 
tation with the domain, exploring what workflows are generally possible with 
certain input data. In many realistic applications, the target of the loosely spec- 
ified branch is a more specific component, for instance an alignment viewer, so 
that the synthesis problem as such becomes more specific, too. 

The five concretizations shown in Figure 3 are of course not the only possible 
ones. As we will detail in the following, thousands if not millions of solutions are 
easily possible with the described domain model, but they are not necessarily 
desired or adequate. We will show in the following how playing with configura- 
tions and constraints may help mastering this enormous potential by excluding 
inadequate or bundling equivalent solutions. 



In a first experiment, we executed tlie syntliesis witli no furtlier constraints. 
Tiiat is, only tiie input /output specification defined by tiie loosely specified 
branch was considered and a "naive" search was performed on the synthesis 
universe. When we used the basic configuration of the synthesis algorithm, we 
obtained 264,118 possible solutions for the synthesis problem already for a search 
depth of 5. Additionally using a solution filtering mechanism that removes ser- 
vice sequences that are mere permutations of others decreased the number of 
solutions to 5,325. As the example domain is characterized by consisting of ser- 
vices that have at most one input and one output data type, it turned out to be 
most efficient, however, to use a configuration in which the synthesis also mimics 
a "pipelining" behavior. This means that output data is only transferred to the 
direct successor of a service in the workflow and in particular not available for 
any other subsequent services. This configuration led to 2269 solutions in the 
unconstrained case. 

In the following we will develop a small set of constraints that is already 
sufficient to drastically reduce the solution space further by excluding clearly 
inadequate solutions. The constraints are based on a few general observations 
about the solutions obtained so far: 

1. Workflows that contain services that make no contribution to solving the 
synthesis problem. Such services are, for example, ReadFile (requiring no 
input but producing arbitrary new data that is planted into the workflow 
and can distract from the actual synthesis problem) or WriteFile (consuming 
data but without producing a new analysis result). 

2. Workflows that contain services that make no progress within the work- 
flow. Particular examples are redundant service calls, such as a the multiple 
invocation of Gblocks on a multiple sequence alignment where only the first 
call does make sense. 

3. Workflows that contain "dead" functionality in the sense that certain 
service outputs are not adequately used. For instance, a BLAST result is 
usually not used as such. Rather, the sequence identifiers of the BLAST hits 
are of interest in order to retrieve the corresponding entries from a database. 
Thus, to make sense, workfiows that call BLAST should also contain a call 
to a BLAST parser that gets the IDs from the BLAST result. 

4. Workfiows that contain several services that are useful when seen individu- 
ally or in parts of the solution, but the overall workfiow is not the envis- 
aged analysis. For example, a solution where an alignment service (such 
as ClustalW) is called with the input data is certainly a possible and useful 
workfiow, but if the workfiow developers' intention was actually to do the 
alignment with homologous sequences of the input sequences, or to obtain a 
phylogenetic tree from the input, this solution is not adequate. 

In order to address the first observation ("services that make no contribu- 
tion" ) we defined a constraint that excludes all corresponding services of our ex- 
ample domain (i.e. ReadFile, ReadDNASequence, WriteFile, and Viewer) from 
the synthesis. This can easily be realized via the SLTL formula 

G{^{S)true) 



which prohibits any occurrence of S in a solution. Since the exclusion of par- 
ticular services is a quite common constraint, it is provided as a template in 
the PROPHETS Synthesis Wizard, which only requires the user to list all 'un- 
wanted' services. Applying this template to all four services named above yields 
a constraint that in itself reduced the number of solutions from 2,269 to 55. 

Adding an SLTL constraint for the second observation ("services that make 
no progress") in a similar fashion, specifying that Gblocks should not never be 
called twice (or more), decreases the number of solutions 49. This number can be 
further reduced by an SLTL constraint exploiting the third observation ("dead 
functionality" ) : requiring that a call of WUBlast should be followed by a call to 
the corresponding parser eliminates another 18 solutions. 

In order to address the last observation ( "workflow is not the envisaged anal- 
ysis"), we provided the synthesis with constraints that express general ideas 
about the desired workflow. As one example, a frequently performed analysis for 
molecular sequences is to search for similar sequences in a database and com- 
pare the new sequences to the known ones by means of a sequence alignment. 
Accordingly, we formulated a constraint that enforces a sequence database search 
by sequence and after that a multiple sequence alignment as part of the solution. 
This decreased the number of solutions further down to 24. The remaining so- 
lutions are similar to the two workflows at the bottom of Figure 3: they start 
with WUBlast and the WUBlastParser, followed by either DBFetch_FetchBatch 
or DBFetch_FetchData, after which one of the alignment algorithms is called. 
Half of the solutions appended Gblocks as final service. 

Another analysis that is often applied to molecular sequences is the con- 
struction of a phylogenetic tree, which represents the evolutionary relationship 
between the sequences. As an alternative to the above constraint for the fourth 
observation, we formulated a constraint that enforces the use of a phylogenetic 
tree construction service. The only remaining result when this constraint is ap- 
plied together with constraints 1, 2 and 3 is a one- step solution consisting of the 
PhyML_DNA service, which computes a phylogenetic tree from a set of DNA 
sequences. 

Table 2 summarizes our findings, containing also all other combinations of 
the constraints developed above: The first constraint (excluding services that 
make no contribution towards solving the synthesis problem) has the strongest 
impact on the number of solutions, whereas the second constraint leads only to 
an exclusion of around 80 solutions. Notably, already constraints 1 and 4 together 
decrease the number of solutions to 24, i.e. to the set of solutions that was also 
obtained with the first four constraints together. Likewise, constraints 1 and 4' 
together already lead to the single-step solution that was initially obtained by a 
combination of four constraints. 

As the table furthermore shows, some combinations of constraints, in fact 
most combinations involving 4 and 4', leave no solutions at all. This may indicate 
that the constraints are inconsistent (i.e. do not have any solution), or, as in this 
case, that they require solutions of length greater than the current search depth 



Table 2. Overview of results (considering solutions found until a search depth of 5). 



Constraints 


Visited nodes 


Solutions 




Constraints 


Visited nodes 


Solutions 


none 


34,026 


2,269 




1, 2, 3 


9,603 


31 


1 


1,139 


55 




12 4 


8,057 


24 


2 


82,343 


2,194 




1, 2, 4' 


2,084 


1 


3 


132,809 


1,916 




1, 3, 4 


28,545 


24 


4 


436,102 


471 




1, 3, 4' 


18,699 





4' 


129,200 


406 




1, 4, 4' 


15,919 





1,2 


1,103 


49 




2, 3,4 


919,162 


138 


1,3 


3,123 


52 




2, 3, 4' 


284,463 


347 


1,4 


8,309 


24 




2, 4, 4' 


859,047 


18 


1,4' 


2,336 


1 




3, 4, 4' 


1,752,153 





2, 3 


138,137 


1,847 




1, 2, 3, 4 


28,545 


24 


2,4 


443,860 


459 




1,2, 3, 4' 


2,084 


1 


2, 4' 


181,365 


394 




1, 2, 4, 4' 


15,235 





3,4 


910,672 


138 




1, 3, 4, 4' 


54,711 





3, 4' 


277,239 


359 




2, 3, 4, 4' 


1,764,843 





4, 4' 


847,845 


18 




all 


54,027 






of 5. In the latter case, increasing the search depth may be the solution of choice, 
but there are two options which apply in both cases: 

— loosening the constraints, in order to resolve the inconsistencies or to allow 
shorter solutions, and 

— revise/extend the domain model in order to increase the solution space. 

Whereas playing with the first option may well be in the competence of the 
workflow designer, the second option requires domain modeling expertise. 

The second column of the table gives the number of nodes that are visited 
by the synthesis algorithm during its iterative deepening depth-first search. The 
numbers reflect that the synthesis search space is not only constituted by the 
(static) service descriptions, but also by the additionally provided logical con- 
straints (cf. Section 2). In fact, the search space may grow with the product of 
the sizes of the formula and the constraints. As the table shows, in our case con- 
straints can both decrease and increase the search space: In the unconstrained 
case, 34,026 nodes are visited, the least number of nodes (1,103) is visited when 
constraints 1 and 2 are used, and with constraints 2, 3, 4, and 4' a total of 
1,764,843 nodes are visited until a search depth of 5. 

Note that not all constraints we defined here are useful in every setting. 
For instance, services that generate data "from scratch" may well be useful 
when there are other services involved that need this additional input. Thus, in 
cases where producing entirely new data is considered a useful or even necessary 
feature, the first constraint needs to be relaxed. Thus changing the considered 
problem/purpose may in addition to adapting the purpose-specific constraints 
(here 4 and 4') also require to reconsider all the other constraint classes. 



5 Conclusion 



Methods for the automatic composition of services into executable workflows 
need detailed knowledge about the application domain. In this paper we dis- 
cussed how the EMBRACE data and methods ontology (EDAM) can be used 
as background knowledge for the composition of bioinformatics workflows. We 
found that the EDAM knowledge facilitates finding possible workfiows, but 
that additional knowledge is required to limit the search to the actually de- 
sired/adequate solutions: 

— EDAM provides a controlled vocabulary for data and methods in the bioin- 
formatics domain, whereby it covers in particular the service and data type 
description terminology that is needed for the automatic composition of 
services into workflows. Given that the services in the domain and their in- 
put/output types are properly annotated in terms of EDAM, this knowledge 
ensures that the synthesis algorithms find adequate workfiows. 

— However, finding the actually desired workfiows requires more knowledge. In 
this paper we provided simple examples of additional constraints, such as 
exclusion of particular services, dependencies between services, and general 
patterns for a desired solution. 

Larger service collections like EMBOSS [28] or the BioCatalogue [29], which 
provide hundreds or even thousands of services, are not manageable without sys- 
tematic discovery or service composition techniques. When dealing with small 
domains, human experts may be unbeatable in composing tailored workfiows, 
but firstly not every human is an expert in bioinformatics services and data 
types, and secondly also experts cannot always keep track of all changes in large 
domain libraries. Thus both experts and average users may profit from tools 
that automatically exploit arising domain-specific and problem-specific knowl- 
edge beyond the usual "static" domain model (i.e. service interface descriptions). 

The PROPHETS synthesis framework enables a very fiexible way of express- 
ing additional knowledge: it can either be specified during domain modeling 
(especially suitable for domain-specific constraints) or during the actual synthe- 
sis (especially suitable for problem-specific constraints). Thus, the expression of 
(additional) domain knowledge is, in particular, cleanly separated from the im- 
plementation of the synthesis algorithm. This is in contrast to other approaches 
to automatic composition of bioinformatics services that we are aware of (such 
as [1, 4, 5]), which rely on the knowledge that is provided by the service and data 
type descriptions and ontological classifications, and where all additional domain 
knowledge (if any) is hidden in specifically designed composition algorithms. 

Currently we are exploring the bioinformatics domain further in order to 
identify general domain-specific constraints, and problem-specific constraints es- 
pecially in the shape of often recurring workfiow patterns. Such a library of 
constraints that can be added and removed dynamically during the workfiow 
development process will enable users to work and experiment with the domain 
in a very fiexible manner by tailoring their solution space on demand. This 
kind of experimentation provides users with an easy entry into this complex 



landscape of tools and technologies, and later with means for scalability: exper- 
imenting with options and constraints it is possible to tailor the setting in a 
way that at the same time improves the adequacy of the specification as well as 
the search depths for solutions of the synthesis procedure. We plan to support 
this approach by extending the domain knowledge with domain-specific search 
heuristics enhancing the synthesis algorithms. 
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